A new VLSI design of a pipeline Reed-Solomon decoder is presented. The transform
decoding technique used in a previous article is replaced by a time domain algorithm
through a detailed comparison of their VLSI implementations. A new architecture that
implements the time domain algorithm permits efficient pipeline processing with reduced
circuitry. Erasure correction capability is also incorporated with little additional
complexity. By using a multiplexing technique, a new implementation of Euclid's algorithm
maintains the throughput rate with less circuitry. Such improvements result in both
enhanced capability and significant reduction in silicon area.


I. Introduction 

Recently a VLSI design of a pipeline Reed-Solomon
decoder was presented [1]. A modified form of Euclid's
algorithm was developed which avoided computations of
inverse elements. A systolic array architecture was designed,
from a suggestion by Brent and Kung [2], to implement the
modified Euclid's algorithm. More recently, another VLSI
design of an RS decoder was introduced [3]. It combined the
algorithm in [4] with the modified Euclid's algorithm in place
of the continued fraction technique. The decoder design in
[3] used a time domain decoding algorithm to reduce the
massive circuitry required by the inverse transform in [1].
The decoder design also included the erasure correction
capability, and, during the design process, a recursive
architecture was derived to implement the modified Euclid's
algorithm with far fewer circuits than were used in [1].


It has been pointed out [5] that the errata locator
polynomial can be obtained directly from the Berlekamp-Massey
algorithm if initialized properly. This suggestion led to
improvements in the VLSI design in [3].

In this article, an efficient time domain RS decoding
algorithm is described and verified. It is shown that the modified
Euclid's algorithm can produce the errata locator polynomial
and errata evaluator polynomial simultaneously, similar to
the Berlekamp-Massey algorithm. The VLSI architectures
for syndrome computations, polynomial expansions, the
modified Euclid's algorithm, and polynomial evaluations
are also described.

This work was carried out during the architectural phase of
the Advanced Reed-Solomon Decoder (ARSD) project and
should be viewed as a companion to the recent work of
Truong et al. [6]. In that article, a transform domain decoder
architecture is developed which, due to its design simplicity,
has been chosen for the prototype VLSI implementation of
the ARSD. However, the work presented here and in [6]
clearly shows that the time domain architecture has many
desirable features which make it an attractive candidate for
future VLSI implementation.

II. The Time Domain Reed-Solomon 
Decoding Algorithm 

Let $N = 2^m - 1$ be the length of the $(N, I)$ RS code with
design distance $d$.

Let

$$R(X) = \sum_{i=0}^{N-1} r_i X^i$$

be the received message. Suppose $e$ errors and $E$ erasures
occur, and $2e + E \le d - 1$. Define $\Lambda = \{\alpha^{-i} \mid r_i \text{ declared as an erasure}\}$.

The decoding algorithm is as follows: 

Step 1. Compute the syndromes

$$S_k = \sum_{i=0}^{N-1} r_i X^i \Big|_{X=\alpha^k} = \sum_{i=0}^{N-1} r_i \alpha^{ki} \quad \text{for } 1 \le k \le d-1 \tag{1}$$

Form a syndrome polynomial

$$S(X) = \sum_{k=1}^{d-1} S_k X^{k-1} \tag{2}$$
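Step 1 can be illustrated by computing the syndromes directly from definition (1). The following is a minimal Python sketch, assuming a prime field GF(p) so that field arithmetic reduces to integer arithmetic modulo p. The function name `syndromes` and the low-order-first coefficient list are choices made for this illustration only; the input is the received word of the GF(17) example given later in this section.

```python
def syndromes(received, alpha, p, d):
    """Return [S_1, ..., S_{d-1}] with S_k = sum_i r_i * alpha^(k*i) mod p."""
    return [
        sum(r * pow(alpha, k * i, p) for i, r in enumerate(received)) % p
        for k in range(1, d)
    ]

# Received word of the worked example: R(X) = -2X^5 - 3X^2 + 2X over GF(17).
P, ALPHA, D = 17, 2, 5
r = [0, 2, -3, 0, 0, -2, 0, 0]      # r[i] is the coefficient of X^i
S = syndromes(r, ALPHA, P, D)
print(S)                            # coefficients of S(X), low order first
```

Because $S(X) = \sum_k S_k X^{k-1}$, the returned list is also the coefficient list of the syndrome polynomial of Eq. (2).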

Step 2. Compute the erasure locator polynomial $\Lambda(X)$. Assume
the erasure location information is received in the form of a
binary sequence synchronous to the received message

$$R(X) = \sum_{i=0}^{N-1} r_i X^i$$

Then for each symbol $r_i$ that is labeled as an erasure, $\alpha^{-i}$
should be a root of the erasure locator polynomial $\Lambda(X)$.
That is,

$$\Lambda(X) = \prod_{\alpha^{-i} \in \Lambda} (X - \alpha^{-i}) \tag{3}$$
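A short sketch of Step 2 follows, again assuming a prime field GF(p) so that inverses can be taken with Fermat's little theorem. The name `erasure_locator` and the representation of the binary erasure sequence as a Python list of 0/1 flags are assumptions of this illustration, not details taken from the decoder design.

```python
def erasure_locator(flags, alpha, p):
    """Expand prod (X - alpha^-i) over flagged positions i.

    Coefficients are returned low order first; p must be prime.
    """
    alpha_inv = pow(alpha, p - 2, p)            # alpha^-1 by Fermat's little theorem
    poly = [1]
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        root = pow(alpha_inv, i, p)             # alpha^-i
        shifted = [0] + poly                    # X * poly(X)
        scaled = [(root * c) % p for c in poly] + [0]
        poly = [(a - b) % p for a, b in zip(shifted, scaled)]
    return poly

# Erasures flagged at positions 2 and 5, as in the GF(17) example.
Lam = erasure_locator([0, 0, 1, 0, 0, 1, 0, 0], 2, 17)
print(Lam)                                      # low order first
```

For the flags above the result corresponds to $X^2 + 13X + 2$, the erasure locator polynomial of the worked example.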

Step 3. Multiply the syndrome polynomial $S(X)$ by the
erasure locator polynomial $\Lambda(X)$ to form the modified
syndrome polynomial

$$T(X) = S(X)\,\Lambda(X) \bmod X^{d-1} = \sum_{k=1}^{d-1} T_k X^{k-1} \tag{4}$$
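Step 3 is a polynomial product truncated to its $d - 1$ lowest-order coefficients. A minimal sketch under the same prime-field assumption is given here; `poly_mul_mod_xn` is a hypothetical helper name, and the inputs are the syndrome and erasure locator polynomials of the GF(17) example.

```python
def poly_mul_mod_xn(a, b, n, p):
    """Return a(X) * b(X) mod X^n over GF(p); coefficient lists are low order first."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:                       # terms of degree >= n are discarded
                out[i + j] = (out[i + j] + ai * bj) % p
    return out

S = [13, 3, 10, 14]        # S(X) = 14X^3 + 10X^2 + 3X + 13
Lam = [2, 13, 1]           # Lambda(X) = X^2 + 13X + 2
T = poly_mul_mod_xn(S, Lam, 4, 17)
print(T)                   # coefficients T_1 ... T_4
```

The printed list corresponds to $T(X) = 8X^3 + 4X^2 + 5X + 9$ in the worked example.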

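Step 4. If $\deg(\Lambda(X)) > \deg(T(X))$, then no error has occurred,
i.e., $e = 0$. Thus there is no need to perform the modified
Euclid's algorithm. Let the errata locator polynomial $\sigma(X) = \Lambda(X)$
and the errata evaluator polynomial $\omega(X) = T(X)$. If
$\deg(\Lambda(X)) \le \deg(T(X))$, then perform a modified Euclid's
algorithm on $X^{d-1}$ and $T(X)$ with the following initializations:

$$\begin{aligned} \mu_0(X) &= \Lambda(X), & R_0(X) &= X^{d-1} \\ \lambda_0(X) &= 0, & Q_0(X) &= T(X) \end{aligned} \tag{5}$$

Compute the following iterations:

$$R_i(X) = [\sigma_{i-1} b_{i-1} R_{i-1}(X) + \bar{\sigma}_{i-1} a_{i-1} Q_{i-1}(X)] - X^{|l_{i-1}|} [\sigma_{i-1} a_{i-1} Q_{i-1}(X) + \bar{\sigma}_{i-1} b_{i-1} R_{i-1}(X)] \tag{6}$$

$$\lambda_i(X) = [\sigma_{i-1} b_{i-1} \lambda_{i-1}(X) + \bar{\sigma}_{i-1} a_{i-1} \mu_{i-1}(X)] - X^{|l_{i-1}|} [\sigma_{i-1} a_{i-1} \mu_{i-1}(X) + \bar{\sigma}_{i-1} b_{i-1} \lambda_{i-1}(X)] \tag{7}$$

$$Q_i(X) = \sigma_{i-1} Q_{i-1}(X) + \bar{\sigma}_{i-1} R_{i-1}(X) \tag{8}$$

$$\mu_i(X) = \sigma_{i-1} \mu_{i-1}(X) + \bar{\sigma}_{i-1} \lambda_{i-1}(X) \tag{9}$$

where $a_{i-1}$ and $b_{i-1}$ are the leading coefficients of $R_{i-1}(X)$ and
$Q_{i-1}(X)$, respectively,

$$l_{i-1} = \deg(R_{i-1}(X)) - \deg(Q_{i-1}(X))$$

and

$$\sigma_{i-1} = \begin{cases} 1 & \text{if } l_{i-1} \ge 0 \\ 0 & \text{if } l_{i-1} < 0 \end{cases} \tag{10}$$

with $\bar{\sigma}_{i-1} = 1 - \sigma_{i-1}$. Here the subscripted $\sigma_{i-1}$ is a binary
switch and should not be confused with the errata locator polynomial $\sigma(X)$.

Stop the iterations when $\deg(\lambda_i(X)) > \deg(R_i(X))$. Let the
errata locator polynomial $\sigma(X) = \lambda_i(X)$ and the errata
evaluator polynomial $\omega(X) = R_i(X)$. The $\sigma(X)$ and $\omega(X)$
polynomials, obtained by the modified Euclid's algorithm, both
carry a common scale factor compared to those computed by
the conventional Euclid's algorithm. But this scale factor does
not affect the errata location computations or the errata
magnitude computations.

Step 5. Evaluate the errata locator polynomial $\sigma(X)$ for $\alpha^{-i}$,
$i = 0, \ldots, N - 1$ to find the roots of $\sigma(X)$. If $\sigma(\alpha^{-i}) = 0$,
then $r_i$ is a corrupted symbol.

Step 6. Compute the corresponding errata magnitudes by
evaluating $\omega(X)$ and $\sigma'(X)$ for $\alpha^{-i}$, $i = 0, \ldots, N - 1$. That is,
the errata magnitude

$$e_i = -\frac{\omega(\alpha^{-i})}{\sigma'(\alpha^{-i})}, \quad 0 \le i \le N-1 \tag{11}$$

is computed at each root found in Step 5, and $e_i = 0$ at every other position.
Note that the scale factor carried by $\omega(X)$ and $\sigma(X)$ is
automatically cancelled by this division.

Step 7. Subtracting $e_i$ from $r_i$ yields the decoded codeword

$$c_i = r_i - e_i \tag{12}$$

Note that the modified Euclid's algorithm in Step 4 is a
combination of three techniques. First, observe that the error
locator polynomial $\lambda(X)$ and the errata evaluator polynomial
$\omega(X)$ can be obtained from Euclid's algorithm by computing
the GCD of the modified syndrome $T(X)$ and $X^{d-1}$ with the
following initializations:

$$\begin{aligned} \mu_0(X) &= 1, & R_0(X) &= X^{d-1} \\ \lambda_0(X) &= 0, & Q_0(X) &= T(X) \end{aligned} \tag{13}$$

Since $e$ errors and $E$ erasures occur and $2e + E \le d - 1$, as in
Theorem 8.4 of [7], the following properties hold:

$$\deg(\lambda(X)) = e$$

$$\deg(\omega(X)) < e + E \tag{14}$$

$$\mathrm{GCD}(\lambda(X), \omega(X)) = 1 \tag{15}$$

$$e_i = -\frac{\omega(X)}{[\Lambda(X)\lambda(X)]'} \Bigg|_{X=\alpha^{-i}} \tag{16}$$

$$\lambda(X)\,\Lambda(X)\,S(X) = \omega(X) \bmod X^{d-1} \tag{17}$$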


Applying properties (14) and (17) to Theorem 8.5 of [7]
implies that there exist a unique $j$ and a unique polynomial
$\beta(X)$ such that

$$\lambda(X) = \beta(X)\,\lambda_j(X)$$
$$\omega(X) = \beta(X)\,R_j(X)$$

By properties (15) and (16), $\beta(X)$ is a constant, which can be
taken to be unity without affecting the roots of $\lambda(X)$ or the
magnitudes $e_i$. The second technique applied to the modified
Euclid's algorithm is that the errata locator polynomial
$\sigma(X) = \Lambda(X)\lambda(X)$ can be obtained directly from Euclid's
algorithm. To achieve this, $\mu_0(X)$ must be initialized to be the
erasure locator polynomial $\Lambda(X)$ instead of 1, and the iteration
stop criterion must be changed to $\deg(R_i(X)) < \deg(\lambda_i(X))$.
Such a change simply results in all $\lambda_i(X)$ carrying the factor
$\Lambda(X)$. The errata evaluator polynomial $\omega(X)$ is not affected
by such initialization because the computation of $R_i(X)$ does
not involve $\lambda_i(X)$. As will be shown later, using the
modified Euclid's algorithm to compute the errata locator
polynomial directly eliminates the need for polynomial
multiplication circuits and delay lines in a VLSI pipeline
implementation. Thirdly, the modified Euclid's algorithm
uses cross multiplication and subtraction to replace
polynomial division. Such operations eliminate the need to
compute finite field inverse elements, which is performed by a
table look-up, in this step. Since a look-up table occupies
a large silicon area in VLSI, it is preferable to do
this as infrequently as possible.
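The iteration of Step 4 can be sketched in a few lines of Python. This is an illustrative version only: it assumes a prime field GF(p), stores polynomials as low-order-first coefficient lists, and uses the hypothetical helper names `degree`, `combine`, and `modified_euclid`. The cross multiplication by the leading coefficients replaces polynomial division, so no field inverse is computed, and $\mu_0(X)$ is initialized to the erasure locator so that the errata locator comes out directly.

```python
def degree(poly):
    """Degree of a low-order-first coefficient list; -1 for the zero polynomial."""
    for i in range(len(poly) - 1, -1, -1):
        if poly[i] != 0:
            return i
    return -1

def combine(c1, f, c2, g, shift, p):
    """Return c1*f(X) - X^shift * c2*g(X) over GF(p)."""
    out = [0] * max(len(f), len(g) + shift)
    for i, fi in enumerate(f):
        out[i] = (out[i] + c1 * fi) % p
    for i, gi in enumerate(g):
        out[i + shift] = (out[i + shift] - c2 * gi) % p
    return out

def modified_euclid(T, Lam, d, p):
    """Return (sigma, omega), both carrying a common nonzero scale factor."""
    if degree(Lam) > degree(T):
        return Lam, T                        # no errors: erasures only
    R, Q = [0] * (d - 1) + [1], list(T)      # R_0 = X^(d-1), Q_0 = T
    lam, mu = [0], list(Lam)                 # lambda_0 = 0, mu_0 = Lambda
    while degree(lam) <= degree(R):
        a, b = R[degree(R)], Q[degree(Q)]    # leading coefficients
        l = degree(R) - degree(Q)
        if l >= 0:                           # switch = 1: reduce R by Q
            R, lam = combine(b, R, a, Q, l, p), combine(b, lam, a, mu, l, p)
        else:                                # switch = 0: exchange roles of R and Q
            R, Q = combine(a, Q, b, R, -l, p), R
            lam, mu = combine(a, mu, b, lam, -l, p), lam
    return lam, R

# Modified syndromes and erasure locator of the GF(17) example, d = 5.
sigma, omega = modified_euclid([9, 5, 4, 8], [2, 13, 1], 5, 17)
print(sigma, omega[:degree(omega) + 1])
```

With the example inputs the sketch stops after two iterations, with `sigma` corresponding to $9X^3 + 2X^2 + 2X + 8$ and `omega` to $10X^2 - X + 2$ (the coefficient 16 is $-1$ modulo 17).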

Example. Consider an RS (8,4) code over GF(17) with
generator polynomial $g(X) = (X - 2)(X - 2^2)(X - 2^3)(X - 2^4)$.
Suppose two erasures and one error have occurred and the all
zero codeword was sent. Let $R(X) = -2X^5 - 3X^2 + 2X$ be
the received vector with locations $X^5$ and $X^2$ flagged as
erasures. Thus the erasure locator polynomial

$$\Lambda(X) = (X - 2^{-5})(X - 2^{-2}) = X^2 + 13X + 2$$
(1) Compute the syndromes

$$S_k = \sum_{i=0}^{7} r_i 2^{ik}, \quad k = 1, 2, 3, 4$$

$S_1 = R(2^1) = 13$

$S_2 = R(2^2) = 3$

$S_3 = R(2^3) = 10$

$S_4 = R(2^4) = 14$

Form the syndrome polynomial

$$S(X) = \sum_{k=1}^{4} S_k X^{k-1} = S_4 X^3 + S_3 X^2 + S_2 X + S_1 = 14X^3 + 10X^2 + 3X + 13$$

(2) Compute the modified syndromes

$$T(X) = S(X)\,\Lambda(X) \bmod X^4 = (14X^3 + 10X^2 + 3X + 13)(X^2 + 13X + 2) \bmod X^4 = 8X^3 + 4X^2 + 5X + 9$$

Thus

$T_4 = 8$, $T_3 = 4$, $T_2 = 5$, $T_1 = 9$

(3) Perform the modified Euclid's algorithm

$\mu_0(X) = \Lambda(X) = X^2 + 13X + 2$
$\lambda_0(X) = 0$
$R_0(X) = X^4$
$Q_0(X) = T(X) = 8X^3 + 4X^2 + 5X + 9$

$R_1(X) = 8R_0(X) - XQ_0(X) = 8X^4 - X(8X^3 + 4X^2 + 5X + 9) = -4X^3 - 5X^2 - 9X$

$\lambda_1(X) = 8\lambda_0(X) - X\mu_0(X) = -X(X^2 + 13X + 2) = -X^3 - 13X^2 - 2X$

$Q_1(X) = Q_0(X) = 8X^3 + 4X^2 + 5X + 9$
$\mu_1(X) = \mu_0(X) = X^2 + 13X + 2$

$R_2(X) = 8R_1(X) - (-4)Q_1(X) = 8(-4X^3 - 5X^2 - 9X) + 4(8X^3 + 4X^2 + 5X + 9) = 10X^2 - X + 2$

$\lambda_2(X) = 8\lambda_1(X) - (-4)\mu_1(X) = 8(-X^3 - 13X^2 - 2X) + 4(X^2 + 13X + 2) = 9X^3 + 2X^2 + 2X + 8$

Since $\deg(\lambda_2(X)) - \deg(R_2(X)) = 1 > 0$, stop.

Thus the errata evaluator is

$\omega(X) = R_2(X) = 10X^2 - X + 2$

and the errata locator is

$\sigma(X) = \lambda_2(X) = 9X^3 + 2X^2 + 2X + 8$

(4) Perform Chien search on $\sigma(X)$ and evaluate $-\omega(X)/\sigma'(X)$

$\sigma(2^{-7}) = 7$;   $e_7 = 0$

$\sigma(2^{-6}) = 12$;  $e_6 = 0$

$\sigma(2^{-5}) = 0$;   $e_5 = -\omega(2^{-5})/\sigma'(2^{-5}) = -2$

$\sigma(2^{-4}) = 16$;  $e_4 = 0$

$\sigma(2^{-3}) = 8$;   $e_3 = 0$

$\sigma(2^{-2}) = 0$;   $e_2 = -\omega(2^{-2})/\sigma'(2^{-2}) = -3$

$\sigma(2^{-1}) = 0$;   $e_1 = -\omega(2^{-1})/\sigma'(2^{-1}) = 2$

$\sigma(2^{-0}) = 4$;   $e_0 = 0$
(5) $c_i = r_i - e_i$, $i = 7, 6, 5, 4, 3, 2, 1, 0$

= (0, 0, -2, 0, 0, -3, 2, 0) - (0, 0, -2, 0, 0, -3, 2, 0)

= (0, 0, 0, 0, 0, 0, 0, 0)
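Parts (4) and (5) of the example can be checked with a short script. This is a minimal sketch assuming the prime field GF(17) of the example; `poly_eval` and `errata` are hypothetical helper names, the division in Eq. (11) is carried out with a Fermat inverse, and magnitudes are reported as residues in 0..16 (so $-2$ appears as 15 and $-3$ as 14).

```python
def poly_eval(poly, x, p):
    """Evaluate a low-order-first coefficient list at x over GF(p) by Horner's rule."""
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % p
    return acc

def errata(sigma, omega, alpha, n, p):
    """Return [e_0, ..., e_{n-1}] using e_i = -omega(alpha^-i) / sigma'(alpha^-i)."""
    dsigma = [(i * c) % p for i, c in enumerate(sigma)][1:]   # formal derivative
    alpha_inv = pow(alpha, p - 2, p)
    e = []
    for i in range(n):
        x = pow(alpha_inv, i, p)
        if poly_eval(sigma, x, p) != 0:
            e.append(0)                     # alpha^-i is not a root: symbol i is clean
        else:
            num = poly_eval(omega, x, p)
            den = poly_eval(dsigma, x, p)
            e.append((-num * pow(den, p - 2, p)) % p)
    return e

P, ALPHA, N = 17, 2, 8
sigma = [8, 2, 2, 9]                # 9X^3 + 2X^2 + 2X + 8
omega = [2, 16, 10]                 # 10X^2 - X + 2
r = [0, 2, -3, 0, 0, -2, 0, 0]      # received word, r[i] multiplies X^i
e = errata(sigma, omega, ALPHA, N, P)
c = [(ri - ei) % P for ri, ei in zip(r, e)]
print(e, c)
```

The corrected word `c` is the all zero codeword, in agreement with part (5).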

The VLSI architecture of the pipeline RS decoder is shown
in Fig. 1. The syndromes $S(X)$ are computed by a form of
polynomial evaluation. The $\alpha^k$ generation block converts binary
erasure location information to powers of $\alpha$ which are the roots
of the erasure locator polynomial. The modified syndromes
$T(X)$ and the erasure locator polynomial $\Lambda(X)$ can be
computed by two polynomial multiplication circuits. By the use of
a multiplexing and recursive technique, the modified Euclid's
algorithm is implemented with a significant reduction of cells
over a previous design [1]. The errata evaluator polynomial
$\omega(X)$ and the errata locator polynomial $\sigma(X)$ are then evaluated
using two polynomial evaluation circuits different from the
one used for syndrome computation. The errata locations thus
obtained direct the subtractions of the errata from the received
messages to produce the decoded messages. In the following,
the VLSI design of each functional block is described.

III. VLSI Implementation of the 
Syndrome Computation 

The syndrome computation

$$S_k = \sum_{i=0}^{N-1} r_i \alpha^{ki}, \quad 1 \le k \le d-1 \tag{18}$$

is an evaluation of a polynomial of length $N$ on $d - 1$ points.
Since $N > d - 1$, it is best to compute all syndromes
simultaneously in the following manner as each $r_i$ is received:

$$S_k = (\cdots(r_{N-1}\alpha^k + r_{N-2})\alpha^k + \cdots + r_1)\alpha^k + r_0 \tag{19}$$

Note that $r_{N-1}$ is the first received symbol. Starting from the
innermost parentheses, syndrome $S_k$ is gradually computed as
the symbols $r_i$ are received. After $r_0$ is entered, all $d - 1$
syndrome computations are completed at the same time. They are
ready to be shifted out serially at that point. A systolic array
design of a syndrome computation circuit is shown in Fig. 2.
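The nested form of Eq. (19) is Horner's rule applied at $d - 1$ points in parallel. A software analogue, offered only as a sketch, keeps one accumulator per syndrome and updates all of them as each symbol arrives. It assumes a prime field GF(p); the name `stream_syndromes` is hypothetical, and nothing here models the timing of the systolic array itself.

```python
def stream_syndromes(symbols_high_first, alpha, p, d):
    """Update all d-1 syndrome accumulators per received symbol (r_{N-1} first)."""
    points = [pow(alpha, k, p) for k in range(1, d)]    # alpha^1 ... alpha^(d-1)
    acc = [0] * (d - 1)
    for r in symbols_high_first:
        # One step of Eq. (19) in every accumulator: S_k <- S_k * alpha^k + r
        acc = [(s * x + r) % p for s, x in zip(acc, points)]
    return acc

# GF(17) example word, listed from r_7 down to r_0.
word = [0, 0, -2, 0, 0, -3, 2, 0]
print(stream_syndromes(word, 2, 17, 5))
```

After the last symbol $r_0$ is entered, the accumulators hold $S_1, \ldots, S_{d-1}$ simultaneously, which mirrors the behaviour described for the circuit.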

IV. A VLSI Design for Polynomial 
Expansion 

Recall that $\Lambda$ is the set of $\alpha^{-i}$ where $\alpha^{-i} \in \Lambda$ implies the
location of $r_i$ is an erasure. The computation of the erasure
locator polynomial $\Lambda(X)$ demands the expansion of

$$\Lambda(X) = \prod_{\alpha^{-i} \in \Lambda} (X - \alpha^{-i}) \tag{20}$$

from one root at a time. Similarly, the modified syndromes

$$T(X) = S(X)\,\Lambda(X) \bmod X^{d-1} = S(X) \prod_{\alpha^{-i} \in \Lambda} (X - \alpha^{-i}) \bmod X^{d-1} \tag{21}$$

can also be computed in the same manner except $T(X)$ uses
$S(X)$, instead of 1, as an initial condition. Therefore, a
polynomial expansion circuit is developed to calculate $T(X)$ and
$\Lambda(X)$.

Note that for an arbitrary $S(X)$, which may be 1,

$$S(X)(X - \alpha^{-i}) = XS(X) - \alpha^{-i}S(X) \tag{22}$$

This computation can be accomplished by a linear shift of
$S(X)$, multiplication of every coefficient of $S(X)$ by $\alpha^{-i}$,
and finite field additions. A systolic array is designed, as
shown in Fig. 3, to implement such simple operations. The
control signal "zero" ensures that the resultant polynomial
would not be changed if $\alpha^{-i} = 0$.
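The shift, scale, and add step of Eq. (22) can be imitated in software with a fixed-length coefficient register. The sketch below assumes a prime field GF(p) and reads the "zero" control as a root input of 0 at every position that is not an erasure; that reading, the function name `expand_step`, and the register lengths are assumptions of this illustration.

```python
def expand_step(reg, root, p):
    """One expansion step: reg(X) * (X - root) mod X^len(reg) over GF(p).

    A root input of 0 plays the role of the "zero" control and leaves reg unchanged.
    """
    if root == 0:
        return list(reg)
    shifted = [0] + list(reg[:-1])          # linear shift: X * reg(X), top term dropped
    return [(s - root * c) % p for s, c in zip(shifted, reg)]

P, ALPHA = 17, 2
alpha_inv = pow(ALPHA, P - 2, P)
flags = [0, 0, 1, 0, 0, 1, 0, 0]            # erasures at positions 2 and 5
roots = [pow(alpha_inv, i, P) if f else 0 for i, f in enumerate(flags)]

T = [13, 3, 10, 14]       # initial condition S(X), register of length d - 1
Lam = [1, 0, 0, 0, 0]     # initial condition 1, register of length d
for root in roots:
    T = expand_step(T, root, P)
    Lam = expand_step(Lam, root, P)
print(T, Lam)
```

Starting from $S(X)$ the register ends at the modified syndromes $T(X)$, and starting from 1 it ends at $\Lambda(X)$, so the same step serves both computations.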

V. A New Architecture to Perform the 
Modified Euclidean Algorithm 

A systolic array was designed in [2] to compute the error
locator polynomial by a modified Euclidean algorithm. The
array required $2t$ cells, twice the number $t$ of correctable
errors. It is capable of performing the modified Euclidean
algorithm continuously.

The modified Euclidean algorithm, however, receives only one
syndrome polynomial in the time interval of one code
word. As a consequence, the original architecture in [2] of
a pipeline RS decoder is not as efficient as it might be. A
substantial portion of the systolic array is always idling. This
fact makes possible a more efficient design with fewer cells
and no loss in the throughput rate.

For the $(N, I)$ RS code the length of the syndrome
polynomial is $N - I$. The maximum length of the resultant Forney
syndrome polynomial is also $N - I$. Imagine now that a single
cell is used recursively to perform the successive steps of the
modified Euclidean algorithm instead of pipelining data to
the next cell. Then it would take $N - I$ recursions to complete
the algorithm, where each recursion requires $N - I$ symbol
times. Therefore, using a single cell recursively requires only a
total of $(N - I)^2$ symbol times to complete the modified form
of Euclidean algorithm. Since a new syndrome polynomial
arrives every $N$ symbol times, only $\lceil (N - I)^2 / N \rceil$ cells
(the ratio rounded up to a whole number of cells) are
needed to process successive syndrome polynomials at a full
pipeline throughput rate.
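The cell count argument is simple arithmetic and can be checked for the three codes listed in Table 1. The sketch below assumes the ratio $(N - I)^2 / N$ is rounded up to a whole cell, which is the reading that reproduces the entries of Table 1; `cells_needed` is a hypothetical name used only here.

```python
def cells_needed(n, i):
    """Recursive cells required for full throughput: ceil((n - i)^2 / n)."""
    busy = (n - i) ** 2          # symbol times one cell spends on one code word
    return -(-busy // n)         # ceiling division with integers only

for n, i in [(15, 9), (31, 15), (255, 223)]:
    # Columns: code, cells in the full systolic array (n - i), multiplexed cells
    print((n, i), n - i, cells_needed(n, i))
```

For the (255,223) code one cell is busy for 1024 symbol times per code word while a new code word arrives every 255 symbol times, so 5 cells suffice in place of the 32 of the full systolic array.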

Figure 4 shows the new alternate architectural design. The
input multiplexer directs the syndrome polynomials to different
cells. Each processor cell is almost identical to the cell
presented in [2], except that it is used to process data
recursively.

The architecture of the new basic cell is given in Fig. 5.
Compared with the previous systolic array design [2], the
present scheme for multiplexing the recursive cell computations
significantly reduces the number of cells and as a consequence
the number of circuits. Table 1 shows that the cell
reduction is greater for high rate codes.


VI. A VLSI Design of a Polynomial
Evaluation Circuit

In RS decoding the errata locator polynomial

$$\sigma(X) = \sum_{i=0}^{e+E} \sigma_i X^i \tag{23}$$

its derivative

$$\sigma'(X) = \sum_{\substack{i=0 \\ i\ \mathrm{odd}}}^{e+E} \sigma_i X^{i-1} \tag{24}$$

and the errata evaluator polynomial

$$\omega(X) = \sum_{i=0}^{e+E-1} \omega_i X^i \tag{25}$$

all need to be evaluated for each $\alpha^{-i}$, $0 \le i < N$. Note that the
syndrome computation is another form of evaluating the
received message polynomial $R(X)$:

$$S_k = \sum_{i=0}^{N-1} r_i X^i \Big|_{X=\alpha^k} = R(X) \Big|_{X=\alpha^k} \quad \text{for } 1 \le k \le d-1 \tag{26}$$

However, the syndrome computation is an evaluation of a
polynomial of length $N$ on $d - 1$ points, whereas both $\sigma(X)$ and
$\omega(X)$, having length $\le e + E + 1 \le d - 1$, are evaluated on $N$
points. If one evaluates $\sigma(X)$ or $\omega(X)$ using the design in
Section III for syndrome computation, it would take $N$ of
these cells. Since $N > d - 1$, there is a more efficient design
which uses only $d - 1$ cells with less complexity.

Consider evaluating a polynomial $A(X)$, $\deg(A(X)) \le d - 2$,

$$A(X) = \sum_{i=0}^{d-2} a_i X^i \tag{27}$$

for $X = \alpha^{-j}$, $j = 1, 2, \ldots, N$.

Hence,

$$A(X) \Big|_{X=\alpha^{-j}} = \sum_{i=0}^{d-2} a_i \alpha^{-ij} = \sum_{i=0}^{d-2} a_i (\alpha^{-i})^j \quad \text{for } j = 1, 2, \ldots, N \tag{28}$$

For each $a_i$ the quantity $a_i(\alpha^{-i})^j$ can be obtained by
recursively multiplying by a fixed constant $\alpha^{-i}$ as $j$ goes from 1 to $N$.

A finite field summation of $d - 1$ terms results in the desired
polynomial evaluation. A systolic array design of such an
operation is shown in Fig. 6. Note that the results of
evaluating $\sigma(X)$, $\sigma'(X)$, and $\omega(X)$ are produced sequentially. This
matches perfectly with the sequential nature of the received
data $R(X)$ in a real-time decoding environment.

One last observation on the polynomial evaluation: the
evaluation of $\sigma'(X)$ uses only the coefficients of $\sigma(X)$ with
odd power terms. This property makes it possible to obtain
the evaluation of $\sigma'(X)$ as a by-product from the evaluation
of $\sigma(X)$ at no cost. As illustrated in Fig. 7, simply use two
smaller exclusive-OR trees to sum the even terms and odd
terms of $\sigma(X)$ separately. The summation of the odd terms
yields $\alpha^{-i}\sigma'(\alpha^{-i})$, that is, $\sigma'(\alpha^{-i})$ up to the known
factor $\alpha^{-i}$. Another exclusive-OR operation on the two
partial sums results in $\sigma(\alpha^{-i})$ itself.
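The recursive evaluation of Eq. (28) and the even/odd split of Fig. 7 can be mimicked in software over a field of characteristic 2. The sketch below assumes GF(2^4) generated by the primitive polynomial $x^4 + x + 1$ with $\alpha = 2$; the field size, the polynomial, and all function names are choices made for this illustration and are not taken from the design. Each register holds one term $a_i(\alpha^{-i})^j$ and is multiplied by its fixed constant once per step; in this arithmetic the odd-term sum equals $X\sigma'(X)$ at the evaluation point.

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4), reducing by the assumed polynomial x^4 + x + 1."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return out

def gf16_pow(a, n):
    """Raise a to the n-th power in GF(2^4) by repeated multiplication."""
    out = 1
    for _ in range(n):
        out = gf16_mul(out, a)
    return out

def horner(coeffs, x):
    """Direct evaluation of a low-order-first coefficient list, for comparison."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf16_mul(acc, x) ^ c
    return acc

def evaluate_sequentially(coeffs, n=15):
    """Yield (value, odd_sum) at X = alpha^-j for j = 1..n."""
    alpha_inv = gf16_pow(2, 14)                        # alpha^-1 = alpha^14
    steps = [gf16_pow(alpha_inv, i) for i in range(len(coeffs))]
    regs = list(coeffs)                                # register i: a_i * alpha^(-i*j)
    for _ in range(n):
        regs = [gf16_mul(r, s) for r, s in zip(regs, steps)]
        even = odd = 0
        for i, r in enumerate(regs):                   # two smaller XOR trees
            if i % 2:
                odd ^= r
            else:
                even ^= r
        yield even ^ odd, odd                          # one more XOR gives the value

coeffs = [3, 7, 1, 9]                                  # arbitrary test polynomial
results = list(evaluate_sequentially(coeffs))
print(results[:3])
```

The number of multipliers equals the number of coefficients rather than the number of evaluation points, which is the saving the section describes, and the results come out one evaluation point per step.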
Table 1. Comparison of the number of cells required in the
modified Euclid's algorithm computation

RS code       Full systolic array    Multiplexing on recursive cells
(15, 9)                6                           3
(31, 15)              16                           9
(255, 223)            32                           5
Fig. 4. The new architecture for performing the modified form of Euclid's algorithm

Fig. 5. Block diagram of basic cell for computing the
modified Euclid's algorithm

Fig. 6. A systolic array for polynomial evaluation

Fig. 7. The polynomial evaluation circuit for $\sigma(X)$ and $\sigma'(X)$